A Multivariate Two-Sample Test using the Jaccard Distance

Author

  • Stuart B. Heinrich
Abstract

A common need in statistics is to assess whether two samples come from the same underlying population distribution. Existing two-sample tests often make limiting a priori assumptions, or cannot be easily generalized to multivariate data. We derive a new multivariate two-sample test that makes no a priori assumptions, has higher statistical power than previous tests, has better runtime performance, has an easily understood geometrical interpretation, and is simple to implement.

Similar resources

Jaccard distance based weighted sparse representation for coarse-to-fine plant species recognition

Leaf based plant species recognition plays an important role in ecological protection, however its application to large and modern leaf databases has been a long-standing obstacle due to the computational cost and feasibility. Recognizing such limitations, we propose a Jaccard distance based sparse representation (JDSR) method which adopts a two-stage, coarse to fine strategy for plant species ...

Hypothesis testing of genetic similarity based on RAPD data using Mantel tests and model matrices

Clustering and ordination procedures in multivariate analyses have been widely used to describe patterns of genetic distances. However, in some cases, such as when dealing with Jaccard coefficients based on RAPD data, these techniques may fail to represent genetic distances because of the high dimensionality of the genetic distances caused by stochastic variation in DNA fragments among the unit...

Combining Mahalanobis and Jaccard Distance to Overcome Similarity Measurement Constriction on Geometrical Shapes

In this study Jaccard Distance was performed by measuring the asymmetric information on binary variable and the comparison between vectors component. It compared two objects and notified the degree of similarity of these objects. After thorough preprocessing tasks; like translation, rotation, invariance scale content and noise resistance done onto the hand sketch object, Jaccard distance still ...

Multivariate Stream Data Classification Using Simple Text Classifiers

We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a simple text classification algorithm to clas...

A Test of Homogeneity for Two Multivariate Populations

The classical tests of homogeneity, such as the two-sample Kolmogorov-Smirnov test, do not have a natural extension to comparing two multivariate populations. G. J. Székely and N. K. Bakirov have proposed a new test based on Euclidean distance between sample elements. This test can be applied to testing homogeneity of any two multivariate populations with finite second moments, and the test is r...

Publication date: 2012